Inner product computation for sparse iterative solvers on distributed supercomputer
Recent years have shown that iterative Krylov methods, unless redesigned, are not suitable for distributed supercomputers because of their intensive global communications. It is well accepted that re-engineering Krylov methods for a prescribed computer architecture is necessary and important to achieve higher performance and scalability. The paper focuses on simple and practical ways to reorganize Krylov methods and improve their performance on current heterogeneous distributed supercomputers. In contrast with most current software development for Krylov methods, which usually focuses on efficient matrix-vector multiplications, the paper focuses on how to compute inner products on supercomputers and explains why inner product computation on current heterogeneous distributed supercomputers is crucial for scalable Krylov methods. Communication complexity analysis shows how the inner product computation can become the performance bottleneck of (inner) product-type iterative solvers on distributed supercomputers because of global communications. Principles for reducing such global communications are discussed. The importance of minimizing communications is demonstrated by experiments using up to 900 processors. The experiments were carried out on a Dawning 5000A, one of the fastest and earliest heterogeneous supercomputers in the world. Both the analysis and the experiments indicate that inner product computation is very likely to be the most challenging kernel for inner product-based iterative solvers to achieve exascale
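To illustrate why an inner product forces global communication, the following minimal sketch simulates a distributed dot product in plain Python. The block partitioning and the helper names (`partition`, `distributed_dot`) are assumptions made for illustration and are not taken from the paper; a real solver would perform the final sum with a collective operation such as an MPI all-reduce.

```python
def partition(vec, num_procs):
    """Split vec into num_procs contiguous blocks, one per simulated rank."""
    base, extra = divmod(len(vec), num_procs)
    blocks, start = [], 0
    for rank in range(num_procs):
        size = base + (1 if rank < extra else 0)
        blocks.append(vec[start:start + size])
        start += size
    return blocks


def distributed_dot(x_blocks, y_blocks):
    """Simulated distributed inner product of two block-partitioned vectors."""
    # Step 1: each simulated rank forms its partial sum from local data only.
    partials = [
        sum(a * b for a, b in zip(xb, yb))
        for xb, yb in zip(x_blocks, y_blocks)
    ]
    # Step 2: the partial sums are combined. On a real machine this is a
    # global reduction in which every rank must take part.
    return sum(partials)


x = [float(i) for i in range(1, 11)]
y = [2.0] * 10
result = distributed_dot(partition(x, 4), partition(y, 4))
print(result)
```

Step 1 scales with the local vector length, while step 2 involves every processor, which is the part the abstract identifies as the likely bottleneck.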
Minimizing synchronizations in sparse iterative solvers for distributed supercomputers
Eliminating synchronizations is one of the important techniques for minimizing communications in modern high performance computing. This paper discusses principles for reducing communications due to global synchronizations in sparse iterative solvers on distributed supercomputers. We demonstrate how to minimize global synchronizations by rescheduling a typical Krylov subspace method. The benefit of minimizing synchronizations is shown in theoretical analysis and verified by numerical experiments using up to 900 processors. The experiments also show that the communication complexities of some structured sparse matrix-vector multiplications and of global communications on the underlying supercomputer are of order P^(1/2.5) and P^(4/5) respectively, where P is the number of processors. The experiments were carried out on a Dawning 5000A
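One common way to cut global synchronizations is to merge several inner products into a single reduction. The sketch below illustrates that general principle only; it is not the rescheduled Krylov method from the paper. The `SimulatedComm` class, its `allreduce_sum` method, and the sample vectors are hypothetical stand-ins for a real communicator and real solver data.

```python
class SimulatedComm:
    """Hypothetical communicator that counts global reductions."""

    def __init__(self):
        self.reductions = 0

    def allreduce_sum(self, per_rank_values):
        # per_rank_values holds one tuple of partial sums per simulated rank.
        # Each call stands for one global synchronization point.
        self.reductions += 1
        return tuple(sum(column) for column in zip(*per_rank_values))


def local_dot(a, b):
    """Partial inner product computed from one rank's local data."""
    return sum(p * q for p, q in zip(a, b))


def two_dots_separate(comm, r_blocks, w_blocks):
    """Compute (r, r) and (r, w) with two separate reductions."""
    (rr,) = comm.allreduce_sum([(local_dot(r, r),) for r in r_blocks])
    (rw,) = comm.allreduce_sum(
        [(local_dot(r, w),) for r, w in zip(r_blocks, w_blocks)]
    )
    return rr, rw


def two_dots_batched(comm, r_blocks, w_blocks):
    """Compute (r, r) and (r, w) with one reduction over a pair of values."""
    return comm.allreduce_sum(
        [(local_dot(r, r), local_dot(r, w)) for r, w in zip(r_blocks, w_blocks)]
    )


r_blocks = [[1.0, 2.0], [3.0], [4.0, 5.0]]
w_blocks = [[1.0, 1.0], [1.0], [1.0, 1.0]]

comm_separate = SimulatedComm()
separate = two_dots_separate(comm_separate, r_blocks, w_blocks)

comm_batched = SimulatedComm()
batched = two_dots_batched(comm_batched, r_blocks, w_blocks)

print(separate, comm_separate.reductions)
print(batched, comm_batched.reductions)
```

Both variants return the same pair of values, but the batched one synchronizes once instead of twice. Batching is only possible when neither inner product depends on the result of the other, which is why the iteration may need to be rescheduled first.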
An advanced meshless method for time fractional diffusion equation
Recently, because of new developments in sustainable engineering and renewable energy, which are usually governed by a series of fractional partial differential equations (FPDEs), the numerical modelling and simulation of fractional calculus have been attracting more and more attention from researchers. The current dominant numerical method for modelling FPDEs is the Finite Difference Method (FDM), which is based on a pre-defined grid, leading to inherent shortcomings including difficulty in simulating problems with complex domains and in using irregularly distributed nodes. Because of its distinct advantages, the meshless method has good potential for the simulation of FPDEs. This paper aims to develop an implicit meshless collocation technique for FPDEs. The discrete system of FPDEs is obtained by using the meshless shape functions and the meshless collocation formulation. The stability and convergence of this meshless approach are investigated theoretically and numerically. Numerical examples with regular and irregular nodal distributions are used to validate and investigate the accuracy and efficiency of the newly developed meshless formulation. It is concluded that the present meshless formulation is very effective for the modelling and simulation of fractional partial differential equations
Value encoding in the globus pallidus: fMRI reveals an interaction effect between reward and dopamine drive
The external part of the globus pallidus (GPe) is a core nucleus of the basal ganglia (BG) whose activity is disrupted under conditions of low dopamine release, as in Parkinson's disease. Current models assume decreased dopamine release in the dorsal striatum results in deactivation of dorsal GPe, which in turn affects motor expression via a regulatory effect on other nuclei of the BG. However, recent studies in healthy and pathological animal models have reported neural dynamics that do not match with this view of the GPe as a relay in the BG circuit. Thus, the computational role of the GPe in the BG is still to be determined. We previously proposed a neural model that revisits the functions of the nuclei of the BG, and this model predicts that GPe encodes values which are amplified under a condition of low striatal dopaminergic drive. To test this prediction, we used an fMRI paradigm involving a within-subject placebo-controlled design, using the dopamine antagonist risperidone, wherein healthy volunteers performed a motor selection and maintenance task under low and high reward conditions. ROI-based fMRI analysis revealed an interaction between reward and dopamine drive manipulations, with increased BOLD activity in GPe in a high compared to low reward condition, and under risperidone compared to placebo. These results confirm the core prediction of our computational model, and provide a new perspective on neural dynamics in the BG and their effects on motor selection and cognitive disorders
A node-based smoothed conforming point interpolation method (NS-CPIM) for elasticity problems
This paper formulates a node-based smoothed conforming point interpolation method (NS-CPIM) for solid mechanics. In the proposed NS-CPIM, higher order conforming PIM shape functions (CPIM) are constructed to produce a continuous and piecewise quadratic displacement field over the whole problem domain, and the smoothed strain field is obtained through a smoothing operation over each smoothing domain associated with the domain nodes. The smoothed Galerkin weak form is then used to create the discretized system equations. Numerical studies have demonstrated the following good properties: NS-CPIM (1) passes both the standard and the quadratic patch tests; (2) provides an upper bound of strain energy; (3) avoids volumetric locking; (4) provides higher accuracy than the node-based smoothed schemes of the original PIMs